Itemset Size-Sensitive Interestingness Measures for Association Rule Mining and Link Prediction
Abstract
Association rule learning is a data mining technique that can capture relationships between pairs of entities in different domains. The goal of this research is to discover factors from data that can improve the precision, recall, and accuracy of association rules found using interestingness measures and frequent itemset mining. Such factors can be calibrated using validation data and applied to rank candidate rules in domain-dependent tasks such as link existence prediction. In addition, I use interestingness measures themselves as numerical features to improve link existence prediction. The focus of this dissertation is on developing and testing an analytical framework for association rule interestingness measures that makes them sensitive to the relative size of itemsets. I survey existing interestingness measures and then introduce adaptive parametric models for normalizing and optimizing these measures, based on the size of itemsets containing a candidate pair of co-occurring entities. The central thesis of this work is that in certain domains, the link strength between entities is related to the rarity of their shared memberships (i.e., the size of itemsets in which they co-occur), and that a data-driven approach can capture such properties by normalizing the quantitative measures used to rank associations. To test this hypothesis under different levels of variability in itemset size, I develop several test bed domains, each containing an association rule mining task and a link existence prediction task. The definitions of itemset membership and link existence in each domain depend on its local semantics.
My primary goals are: to capture quantitative aspects of these local semantics in normalization factors for association rule interestingness measures; to represent these factors as quantitative features for link existence prediction; to apply them to significantly improve precision and recall in several real-world domains; and to build an experimental framework for measuring this improvement, using information theory and classification-based validation.
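The idea that co-occurrence in a small itemset signals a stronger link than co-occurrence in a large one can be illustrated with a minimal sketch. This is not the dissertation's parametric model: the function name `pair_scores`, the weighting `1 / size ** alpha`, the exponent `alpha`, and the toy data are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def pair_scores(itemsets, alpha=1.0):
    """Return {pair: (support_count, size_weighted_score)}.

    Every itemset containing a pair adds 1 to the plain support count and
    1 / len(itemset) ** alpha to the weighted score, so small (rare) shared
    memberships count more. `alpha` is a hypothetical tuning parameter;
    alpha = 0 reduces the weighted score to the plain support count.
    """
    support = defaultdict(int)
    weighted = defaultdict(float)
    for itemset in itemsets:
        members = sorted(set(itemset))
        if len(members) < 2:
            continue  # no pairs to score
        weight = 1.0 / len(members) ** alpha
        for pair in combinations(members, 2):
            support[pair] += 1
            weighted[pair] += weight
    return {pair: (support[pair], weighted[pair]) for pair in support}


# Toy data (assumed): each set is one itemset, e.g. a group's members.
groups = [
    {"ann", "bob", "cat", "dan", "eve", "fay", "gus", "hal"},  # size 8
    {"ann", "bob"},                                            # size 2
    {"cat", "dan", "eve", "fay"},                              # size 4
]
scores = pair_scores(groups, alpha=1.0)
flat = pair_scores(groups, alpha=0.0)
```

In this toy data, `("ann", "bob")` and `("cat", "dan")` each co-occur in two itemsets, so plain support cannot separate them, while the size-weighted score ranks `("ann", "bob")` higher because one of their shared itemsets has only two members. In a setting like the one the abstract describes, a parameter such as `alpha` would be calibrated on validation data rather than fixed by hand.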
Similar resources
Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm
Abstract as a single objective one. Measures such as support, confidence, and other interestingness criteria used for evaluating a rule can be thought of as different objectives of the association rule mining problem. Support count is the number of records that satisfy all the conditions in the rule. This objective represents the accuracy of the rules extracted from the da...
Reduction of Number of Association Rules with Inter Itemset Distance in Transaction Databases
Association rule discovery has been an important problem of investigation in knowledge discovery and data mining. An association rule describes associations among the sets of items which occur together in transactions of databases. The association rule mining task consists of finding the frequent itemsets and the rules in the form of conditional implications with respect to some prespecified thr...
Association Rule Generation and Evaluation of Interestingness Measures for Artwork Tags
Finding associations between groups of items within a data set is referred to as the problem of association rule mining. This can be decomposed into two subproblems: frequent itemset mining and association rule generation. The open source data mining framework ELKI, used throughout this work for experiments, so far supports only the first subproblem. Therefore the implementation of association...
An Efficient Technique for Frequent Itemset Generation Using the Significance Degree of Items
Mining association rules is one of the most important tasks in data mining. The classical model of association rule mining is support-confidence. The support-confidence model concentrates only on the existence or absence of an item in transaction records and does not take into account the products’ prices and quantities and how such detailed information can affect the overall performance ...
Combined Association Rule Mining
This paper proposes an algorithm to discover a novel kind of association rule, the combined association rule. Compared with conventional association rules, a combined association rule allows users to perform actions directly. Combined association rules are always organized as rule sets, each of which is composed of a number of single combined association rules. These single rules consist of non-actionabl...